REMARKS 



Claims 1-67 remain pending in the application. Reconsideration is respectfully 
requested in light of the following remarks. 

Section 103(a) Rejections ; 

The Examiner rejected claims 1, 4-10, 19, 21-26, 36, 38-42, 48, 52-60, 62, 66 and 
67 under 35 U.S.C. § 103(a) as being unpatentable over Huppenthal, et al. (U.S. Patent 
6,339,819) (hereinafter "Huppenthal") in view of Hinds, et al. (U.S. Patent 6,542,916) 
(hereinafter "Hinds"), and further in view of Chen et al. (U.S. Patent 6,763,365) 
(hereinafter "Chen") claims 2, 3, 15-18, 27-29, 35 and 43-46 as being unpatentable over 
Huppenthal, Hinds and Chen, and further in view of Lasher et al. (U.S. Patent 4,863,247) 
(hereinafter "Lasher"), claims 11, 20, 30, 31, 37, 47 and 61 as being unpatentable over 
Huppenthal, Hinds and Chen, and ftirther in view of Stribaek et al. (U.S. Patent 
7,181,484) (hereinafter "Stribaek"), and claims 12-14, 32-34, 49-51 and 63-65 as being 
unpatentable over Huppenthal, Hinds and Chen, and further in view of Chen et al. (U.S. 
Patent 6,687,725) (hereinafter "Chen2"). Applicants respectftiUy traverse these rejections 
for at least the following reasons. 

In the Response to Arguments section of the present Office Action, the Examiner 
submits that Applicants' Specification (on page 8, at paragraph [1056], lines 6-8; and on 
page 15, at paragraph [1076], lines 6-8]) discloses "the concept of the multiply- 
accumulate-chaining operation. The operation chaining concept discloses the chaining of 
operations in order to perform successive arithmetic operations in response to a single 
initiated instruction. The arithmetic operations can be multiplication, accumulation, 
carry-save, partial multipUcation, and etc." The Examiner is clearly mischaracterizing 
the teachings of Applicants' Specification. 

Paragraph [1056] of Applicants' Specification actually states, in its entirety, the 
following: 
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"Multi-word multiplications and additions may be computed using many word- 
sized multiply, add, and shift operations. As shown in FIG. 1, a 1024-bit integer 
X, for example, can be represented with sixteen 64-bit words X=(xl5, . . . , xl, 
xO). That form is commonly referred to as multi-precision representation. The 
more efficiently a multi-word operation can be performed, the better the public- 
key performance will be. Adding capabilities to a general purpose processor to 
speed up multi-word operations is a key to accelerating pubhc-key computations, 
and add-chaining and multiply-chaining are two of the capabilities needed." 

This passage describes that adding add-chaining and multiply-chaining 
capabihties to a general purpose processor (such as by adding the new processor 
instructions of Applicants' claimed invention) will speed up multi-word operations , and 
therefore, can be used to accelerate public-key computations, which typically include 
such multi-word operations. This passage does not describe "multiply-accumulate- 
chaining," or "the chaining of operations in order to perform successive arithmetic 
operations in response to a single initiated instruction." In fact, it is not clear what the 
Examiner means by the phrase "the chaining of operations in order to perform successive 
arithmetic operations in response to a single initiated instruction." Applicants' claims 
describe a relationship between two successive single arithmetic instructions , not just a 
collection of arithmetic operations that are performed in response to a single initiated 
instruction. 

Paragraph [1076] of Applicants' Specification actually states, in its entirety, the 
following: 

The umulxck instruction is an efficient way to support public-key computations. 
Back-to-back scheduling of multi-word multiplications and accumulations is often 
difficult using today's instruction sets due to the multiplier latency. The umulxck 
instruction performs the multiply-accumulate-chaining operation which combines 
add-chaining and multiply-chaining in one operation and avoids the multiplier 
latency. Using the umulxck instruction, and referring again to FIG. 5, the 
calculation of a 64x1024 bit partial product (yO*X) and the accumulation with a 
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previous partial product (sl6:s0), can be accomplished in the following 20 
instructions: 



set register k=yO; 

umulxck 0,0; 

rO = umulxck xO, sO; 



// clear extended-carry register exc first 



rl5 = umulxck xl5, sl5; 



rl6 = umulxck 0, si 6; 
rl7 = umulxck 0, 0; 



// catch 64 carryout bits 
// catch last carryout bit 



As described in Applicants' specification, each single umulxck instruction in this 
example performs both multiply and accumulate operations and also implicitly allows for 
accumulating an additional partial result of a previous single umulxck instruction , i.e. 
without requiring additional add operations or specifying an additional operand for the 
additional accumulate operation. Specifically, a umulxck instruction of the form shown 
above (rd = umulxck rsl, rs2 - where rsl and rs2 are source operands and rd is a 
destination operand), performs (rsl*k + rs2 + a partial result of a previous single 
arithmetic instruction, such as a previous umulxck instruction) without explicitly 
specifsnng the additional operand (the partial result of the previous single arithmetic 
instruction). The result register specified by operand rd receives the lower order bits of 
the result of this calculation, and the extended carry register (which is also not specified 
by any of the operands of the instruction) receives the upper order bits of the result of this 
calculation. 

In the example referenced by the Examiner, the third single arithmetic instruction 
(rO = umulxck xO, sO): 

performs the calculations (xO*k + sO + exc), where exc = 0 (because the second 
instruction above cleared exc) 
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stores the lower order bits of the resuh of this calculation in rO; and 
stores the upper order bits of the result of this calculation in exc. 

In this example, the fourth single arithmetic instruction (not shown, but implied to 
be (rl = umulxck xl, si): 

performs the calculations (xl*k + si + exc), where exc now contains the upper 

order bits of the result of the calculation performed for the third 

instruction, as noted above : 
stores the lower order bits of the result of this calculation in rO; and 
stores the upper order bits of the result of this calculation in exc. 

Similarly, each successive single umulxck instruction in this example performs a 
similar calculation on the explicitly specified source operands and also implicitly adds the 
contents of exc that were stored in exc bv the previous umulxck instruction as the high 
order bits of the calculation made for that previous umulxck instruction . 

As described in detail above, the Examiner's characterization of the teachings of 
Applicants' Specification is clearly incorrect. The cited portions of Applicants' 
Specification describe a specific processor instruction (umulxck). When a single instance 
of this specific processor instruction is executed by the processor, the processor performs 
specific multiply and accumulate calculations, including the implicit addition of a partial 
result of a previously executed single instance of the umulxck instruction . In other 
words, as noted above, execution of the processor instructions recited in Applicants' 
claims facilitates a feedback relationship between two successive single arithmetic 
instructions , not merely a collection of arithmetic operations that are performed in 
response to a single initiated instruction, or a general concept of operation chaining in 
performing a single arithmetic instruction. Because execution of these instructions 
facilitates this feedback relationship, the chaining of successive single instances of this 
umulxck instruction can be used to perform (and accelerate) the multi-word multiplies 
and accumulates that are common in cryptography applications. 
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In the Response to Arguments section of the present Office Action, the Examiner 
also submits, "The Huppenthal prior art discloses architecture for chaining a very large 
number of arithmetic operations (such as multiplication, accumulation, and etc) to form a 
single arithmetic instruction. The single arithmetic instruction is initiated and the entire 
sequence of chained instructions is performed with operand transfer controlled by the 
architecture via the usage of a chain port mechanism. The chain port mechanism supplies 
operands to each successive arithmetic operation in the sequence. The chain mechanism 
supplies operands without any support from the processor. The Hinds prior art discloses 
multiply-accumulate arithmetic operation sequences and its combination with Huppenthal 
discloses these arithmetic operational sequences completed as a single arithmetic 
instructions." Applicants assert that the Examiner is clearly mischaracterizing the 
teachings of both Huppenthal and Hinds in suggesting that the combination teaches 
Applicants' claimed invention. 

Huppenthal is directed to a multiprocessor computer architecture incorporating a 
number of memory algorithmic processors ("MAP") in the memory subsystem or closely 
coupled to other processing elements (e.g., one or more fixed instruction set 
microprocessor-based processors) to enhance overall system processing speed. The 
memory algorithmic processor architecture is an assembly that contains field 
programmable gate arrays (FPGAs) functioning as the memory algorithmic processors. 
In other words, each memory algorithmic processor includes one or more FPGAs that are 
configurable to perform various functions that arc not performed by one of the 
instructions of the fixed instruction microprocessor-based processors in the system . As 
described in the Summary section of Huppenthal, a MAP element can function 
autonomously from its host system (i.e. without processor intervention) once its operands 
have been loaded . MAP elements (which are not processor instructions, nor do they 
represent circuitry to execute arithmetic instructions of a microprocessor instruction set ) 
can be chained together to forward operands from one to another when multiple MAP 
elements are working together to perform a very large function. As described in detail in 
Huppenthal, the MAP architecture, and FPGAs thereof, perform functions in response to 
commands written to them using a write instruction that specifies a command and 
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operands in the data. This is clearly not analogous to the limitations of Applicants' 
claims, which require circuits for executing single arithmetic instructions of a processor 
instruction set . For example, Table 2 of Huppenthal (and the accompanying description) 
describe commands that are communicated to the MAP architecture by issuing a write 
instruction from processor 12. These commands are used to configure various elements 
of the MAP architecture (e.g., RMB, RUC, RECON, LDROM), to pass operands to the 
MAP architecture (e.g., WRTOP, LASTOP), and to initiate the performance of a function 
implemented in the MAP circuitry (e.g., START). 

While performing a function using the MAP circuitry of Huppenthal may 
involve performing a series of calculations or other functions using one or more 
FPGAs, this teaches absolutely nothing about Applicants' claimed invention. 

Applicants' claims are not directed to the general concept of performing a collection of 
arithmetic operations to carry out a single arithmetic instruction, as the Examiner implies, 
or to the chaining of operations performed within a MAP architecture or FPGA in 
response to a command issued using one or more write instructions of a processor, but to 
specific arithmetic instructions in a processor's instruction set , individual instances of 
which implicitly add partial resuks of a previously executed single arithmetic instruction 
when executing a current single arithmetic instruction without explicitly specifying the 
partial result as an operand of the current single arithmetic instruction. 

To establish a prima facie obviousness of a claimed invention, all claim 
limitations must be taught or suggested by the prior art. In re Royka, 490 F.2d 981, 180 
U.S.P.Q. 580 (C.C.P.A. 1974), MPEP 2143.03. The cited art does not teach or suggest 
all limitations of the currently pending claims, some distinctive limitations of which are 
set forth in more detail below. 

Regarding claim 1, the cited art fails to teach or suggest a first arithmetic circuit 
comprising a first plurality of arithmetic structures feeding back high order bits of a 
previously executed single arithmetic instruction of a processor instruction set in the 
public-key cryptography application, generated by the first arithmetic circuit, to a second 
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arithmetic circuit comprising a second plurality of arithmetic structures and the second 
arithmetic circuit generating a first partial result of a currently executing single 
arithmetic instruction of the processor instruction set in the public-key cryptography 
application, wherein the currently executing single arithmetic instruction does not 
include an explicit source operand for specifying the high order bits, the first partial 
result representing the high order bits summed with low order bits of a result of a first 
number multiplied by a second number, the summing of the high order bits being 
performed during multiplication of the first number and the second number, the summing 
and at least a portion of the multiplication being performed in the second arithmetic 
circuit. 

The Examiner submits, "Huppenthal discloses a method implemented in a device 
and storing the first partial result; and using the stored first partial result in a subsequent 
computation in the public-key cryptography application, the method comprising: a 
previously executed single instruction of a processor set and wherein the currently 
executing single arithmetic instructions does not include an explicit source operand for 
specifying the high order bits " (emphasis Examiner's). Applicants first assert that the 
Examiner has piclced and chosen individual words and partial phrases taken from 
Applicants' claim in his remarks, completely ignoring the context in which they are 
used in the claim. For example, the phrase "a method implemented in a device and 
storing the first partial result" has no meaning in the Examiner's remarks, as no partial 
result (or anything that could produce a result or partial result) is referenced by the 
Examiner. Similarly, the phrase "using the stored first partial result in a subsequent 
computation in the public-key cryptography application" has no meaning in the 
Examiner's remarks, since the Examiner has not identified a first computation with 
respect to the recited "subsequent computation." In addition, the phrase, "the method 
comprising: a previously executed single arithmetic instruction of a processor set" is 
incomprehensible, as it does not describe a method step at all. Finally, the phrase 
"wherein the currently executing single arithmetic instructions does not include an 
explicit source operand for specifying the high order bits" has no meaning in the 
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Examiner's remarks, since no such single arithmetic instructions or high order bits are 
referenced in his remarks. 

The Examiner cites Huppenthal (in column 3, lines 1-7) as disclosing "number of 
MAP elements chained together to accomplish a single function or operation (implies a 
single arithmetic instruction)." The Examiner's remarks regarding tlie implication of 
a single arithmetic instruction of a processor instruction set are completely 
unsupported by the reference itself. In fact, as discussed above, the memory 
algorithmic processors taught by Huppenthal are explicitly disclosed as being separate 
from the fixed instruction set microprocessors in the system (such as processors 12i - 
12n). Instead, they act as co-processors implemented in FPGAs or another type of user 
array to perform algorithmic functions in response to commands issued to them using a 
write instruction of the fixed instruction set processor, which is clearly not an arithmetic 
instruction of the fixed instruction set processor. 

The Examiner cites Huppenthal (in column 3, lines 18-25) as disclosing, "MAP 
elements can receive operands via chained port)" and (in column 18, line 66 - column 19, 
line 3) as disclosing, "output data fi-om one MAP element to be sent directly to the user 
array of the next MAP element with no processor intervention via a chain port." 
Applicants again assert that the chaining of MAP elements to perform a large fimction 
coded in FPGAs and initiated by a write instruction teaches nothing about the limitations 
of Applicants' claim. For example, it docs not teach circuitry that feeds back partial 
results of one arithmetic instruction of a processor's instruction set to circuitry executing 
another arithmetic instruction of the processor's instruction set . The chaining of MAP 
elements and passing of operands between MAP elements over a chain port when 
performing a single large operation implemented in one or more FPGAs, as described by 
Huppenthal, teach nothing about feedback between arithmetic instructions of a 
processor's instruction set . 

The Examiner admits that Huppenthal does not specifically disclose a multiplying 
and summing sequence and relies on Hinds to teach such a sequence. Applicants again 
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note that claim 1 does not merely require "a multiplying and summing sequence" but also 
requires feedback between two different arithmetic instructions. The Examiner submits 
that Hinds discloses a first arithmetic circuit comprising a first plurality of arithmetic 
structures feeding back high order bits of an executed arithmetic instruction in the public- 
key cryptography application, generated by the first arithmetic circuit, to a second 
arithmetic circuit comprising a second plurality of arithmetic structures and the second 
arithmetic circuit generating a first partial result of a currently executing arithmetic 
instruction in the public-key cryptography application, the first partial result representing 
the high order bits summed with low order bits of a result of a first number multiplied by 
a second number, the summing of the high order bits being performed during 
muhipHcation of the first number and the second number, the summing and at least a 
portion of the multiphcation being performed in the second arithmetic circuit. Again, the 
Examiner has selectively ignored some of the words explicitly recited in the 
limitations of Applicants' claim, changing the context and meaning of these 
limitations. For example, claim 1 does not merely recite "feeding back high order bits of 
an executed single arithmetic instruction" but requires circuitry for "feeding back high 
order bits of a previously executed single arithmetic instruction of a processor instruction 
set." Similarly, claim 1 does not merely recite "the second arithmetic circuit generating a 
first partial result of a currently executing arithmetic instruction in the public-key 
crj^tography application" but requires the second arithmetic circuit generating a first 
partial result of a currently executing single arithmetic instruction of the processor 
instruction set and that tlic currently executing single arithmetic instruction does not 
include an explicit source operand for specifying the high order bits . 

The Examiner cites Hinds (in column 3, lines 5-17) as disclosing, "applying a 
multiply-accumulate operation to operands; multiplying operands and adding 
multiplication results" and (in column 8, lines 6-25) as disclosing, "chained multiply- 
accumulate operation; carry save adders and usage of partial multiplier (partial results: 
high order bits); output of results of partial multiplier is normalized and passed to carry 
save adders and final product adder)." Hinds is directed to an apparatus and method for 
implementing a floating point MAC instruction that takes three operands as inputs. The 
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cited portions of Hinds describe the operation of an individual floating point MAC 
instruction , which includes both a multiplication and an addition operation. As described 
in Hinds, all three operands are explicitly specified for the MAC instruction; there is no 
additional operand that is implicitly added during execution of Hinds' MAC instruction. 
The Examiner's citation in column 8 refers to a prior art "chained" multiply-accumulate 
FPU, illustrated in FIG. 3B, which is operable to implement a single floating point 
multiply-accumulate operation (A+(B*C)) or floating-point multiply-subtract operation (- 
A+(B*C)) in response to issuance of a single floating-point instruction. This "chained" 
multiply-accumulate FPU passes results of a partial multiplier to a carry save adder and 
final product adder as part of executing a single floating point MAC instruction . This 
partial result is not implicitly fed back to a circuit implementing a subsequent arithmetic 
instruction , as in Applicants' claimed invention, nor is it fed back to a circuit 
implementing the current instruction from a circuit that generated a partial result of a 
previously executed instruction. 

Applicants note that the Examiner has not provided any reason to combine the 
Huppcnthal and Hinds references. As stated in KSR Int'l Co. v. Teleflex Inc., No. 04- 
1350, slip. op. at 14 (U.S. Apr. 30, 2007), "rejections on obviousness grounds cannot be 
sustained by mere conclusory statements; instead, there must be some articulated 
reasoning with some rational underpinning to support the legal standard of obviousness." 
The Examiner must show that "there was an apparent reason to combine the known 
elements in the fashion claimed." Id. The Examiner's analysis "should be made explicit." 
Id. Since the Examiner has not articulated any reason to combine the references, the 
rejection is improper. 

In addition, since the floating point MAC instruction of Hinds does not implicitly 
add a partial result from the previous execution of another floating point MAC (or a 
partial result from the previous execution of any other arithmetic instruction), as 
suggested by the Examiner and as required by Applicant' claim, the combination of 
Huppenthal and Hinds clearly would not result in Applicants' claimed invention. At 
most, it could result in a system in which the floating point MAC instruction of Hinds 
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(which does not include such feedback) is implemented algorithmically in one or more 
FPGAs of the MAP architecture of Huppenthal, e.g., if such an instruction is not included 
in the instruction set of the fixed-instruction processors of Huppenthal. 

The Examiner admits that Huppenthal-Hinds does not specifically disclose 
supporting a cryptography application and relies on Chen to disclose supporting a public- 
key cryptography apphcation (citing Chen, column 6, lines 23-25, and "arithmetic 
operations to support acceleration of cryptographic functions"). The Examiner states, "It 
would have been obvious to one of ordinary skill in the art to modify Huppenthal-Hinds 
to support a cryptographic application as taught by Chen. One of ordinary skill in the art 
would have been motivated to employ the teachings of Chen to greatly improve the 
performance of cryptographic circuits. (Chen col. 5, lines 40-42)." The Examiner seems 
to imply that the reason to modify Huppenthal-Hinds to support cryptographic 
applications is to improve the performance of something that apparently does not exist in 
Huppenthal-Hinds (cryptographic circuits) for something that the Examiner admits that 
Huppenthal-Hinds does not support (i.e. cryptographic applications). Therefore, the 
Examiner has not provided a valid reason to combine the references. In addition, as 
described in detail above, the cited art does not teach all of the limitations of Applicants' 
claims, whether taken alone or in combination. 

For at least the reasons stated above. Applicants assert that the Examiner has 
failed to establish a prima facie rejection of claim 1. 

Independent claims 38 and 66 include limitations similar to those recited in claim 
1 and discussed above, and were rejected for similar reasons. Therefore, the arguments 
presented above apply with equal force to these claims, as well. 

Independent claim 21 includes limitations similar to those recited in claim 1 and 
discussed above, and was rejected for reasons similar to those discussed above regarding 
claim 1 . Therefore, Applicants traverse the rejection of this claim for at least the reasons 
presented above regarding limitations in this claim that are similar to those in claim 1 . 



10/789,311 (6000-31500/SUN030132) 



12 



Meyertons, Hood, Kivlin, Kowert & Goetzel., P.C. 



In addition, independent claim 21 recites supplying a third number to the second 
arithmetic circuit and the first partial result being a representation of the high order bits 
summed with low order bits of a result of a first number multiplied by a second number 
and with the third number. The Examiner submits that Hinds discloses these limitations 
using the same citations and reasoning as those included in remarks directed to claim 1 . 
However, as discussed in detail above. Hinds teaches a floating point MAC instruction of 
the form (A+(B*C)), for which operands A, B, and C are explicitly specified. Hinds does 
not disclose a MAC instruction that feeds back partial results of a previously executed 
instruction, the first partial result being a representation of the high order bits summed 
with low order bits of a result of a first number multiplied by a second number and with 
the third number, as required by claim 21 . 

For at least the reasons above, the rejection of claim 21 is unsupported by the 
cited art and the removal thereof is respectfully requested. 

Claims 53 and 67 include limitations similar to those recited in claims 1 and 21 
and discussed above, and were rejected for the same reasons as claims 1 and 21. 
Therefore, the arguments presented above apply with equal force to these claims, as well. 

Applicants assert that numerous ones of the dependent claims recite fiirther 
distinctions over the cited art. Applicants traverse the rejection of these claims for at 
least the reasons given above in regard to the claims from which they depend. However, 
since the rejections have been shown to be unsupported for the independent claims, a 
discussion of the dependent claims is not necessary at this time. Applicants reserve the 
right to present additional arguments. 
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CONCLUSION 



Applicants submit the application is in condition for allowance, and an early 

notice to that effect is requested. 



If any fees are due, the Commissioner is authorized to charge said fees to 
Meyertons, Hood, Kivlin, Kowert, & Goetzel, P.C. Deposit Account No. 501505/6000- 
31500/RCK. 



Respectfully submitted, 

/Robert C. Kowert/ 

Robert C. Kowert, Reg. #39,255 
Attorney for Applicant(s) 



Meyertons, Hood, Kivlin, Kowert, & Goetzel, P.C. 

P.O. Box 398 

Austin, TX 78767-0398 

Phone: (512) 853-8850 

Date: May 4. 2010 
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